Learning Better Monolingual Models with Unannotated Bilingual Text
Authors
Abstract
This work shows how to improve state-of-the-art monolingual natural language processing models using unannotated bilingual text. We build a multiview learning objective that enforces agreement between monolingual and bilingual models. In our method, the first (monolingual) view consists of supervised predictors learned separately for each language. The second (bilingual) view consists of log-linear predictors learned over both languages on bilingual text. Our training procedure estimates the parameters of the bilingual model using the output of the monolingual model, and we show how to combine the two models to account for dependence between views. For the task of named entity recognition, using bilingual predictors increases F1 by 16.1% absolute over a supervised monolingual model, and retraining on bilingual predictions increases monolingual model F1 by 14.6%. For syntactic parsing, our bilingual predictor increases F1 by 2.1% absolute, and retraining a monolingual model on its output gives an improvement of 2.0%.
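The abstract describes the two views only at a high level. As a rough illustration, the Python sketch below shows one way they could interact, assuming each monolingual model emits a small k-best list of entity-label tuples with probabilities. Everything specific here is a hypothetical choice, not the paper's exact formulation: the agreement features in `features`, the soft-target objective in `train_bilingual` (fitting the bilingual log-linear weights so that its expected features match those of the product of the two monolingual distributions on unannotated bitext), and the down-weighting exponents `beta1`/`beta2` in `combined_predict`, which stand in for "accounting for dependence between views".

```python
import math
from collections import Counter
from itertools import product


def softmax(scores):
    """Numerically stable softmax over a list of real scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def features(y1, y2):
    """Hypothetical bilingual agreement features for a pair of entity-label tuples."""
    return [
        1.0 if len(y1) == len(y2) else 0.0,          # same number of entities
        1.0 if Counter(y1) == Counter(y2) else 0.0,  # same multiset of entity types
    ]


def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))


def bilingual_dist(w, cands1, cands2):
    """P_bi(i, j) proportional to exp(w . phi(cands1[i], cands2[j])) over all k-best pairs."""
    pairs = list(product(range(len(cands1)), range(len(cands2))))
    scores = [dot(w, features(cands1[i], cands2[j])) for i, j in pairs]
    return pairs, softmax(scores)


def train_bilingual(bitext, n_feats=2, lr=0.5, l2=0.01, epochs=100):
    """Gradient ascent on sum_x E_q[log P_bi] - (l2/2)||w||^2, where the soft target
    q(i, j) = P1(i) * P2(j) is the (assumed independent) monolingual output."""
    w = [0.0] * n_feats
    for _ in range(epochs):
        grad = [-l2 * wi for wi in w]
        for (cands1, p1), (cands2, p2) in bitext:
            pairs, p_bi = bilingual_dist(w, cands1, cands2)
            for (i, j), pb in zip(pairs, p_bi):
                q = p1[i] * p2[j]
                for k, f in enumerate(features(cands1[i], cands2[j])):
                    grad[k] += (q - pb) * f  # accumulates E_q[phi] - E_Pbi[phi]
        w = [wi + lr * g for wi, g in zip(w, grad)]
    return w


def combined_predict(w, cands1, p1, cands2, p2, beta1=0.5, beta2=0.5):
    """Argmax over pairs of beta1*log P1 + beta2*log P2 + w . phi.
    w . phi equals log P_bi up to a per-sentence constant, so it is used directly.
    Betas below 1 (hypothetical) avoid fully double-counting monolingual evidence
    that the bilingual view already absorbed during training."""
    pairs = product(range(len(cands1)), range(len(cands2)))
    i, j = max(
        pairs,
        key=lambda ij: beta1 * math.log(p1[ij[0]])
        + beta2 * math.log(p2[ij[1]])
        + dot(w, features(cands1[ij[0]], cands2[ij[1]])),
    )
    return cands1[i], cands2[j]


# Toy unannotated bitext: ((English k-best, probs), (German k-best, probs)).
bitext = [
    (([("PER", "LOC"), ("PER",)], [0.6, 0.4]), ([("PER", "LOC"), ("ORG",)], [0.8, 0.2])),
    (([("ORG",), ("LOC",)], [0.7, 0.3]), ([("ORG",), ("PER",)], [0.9, 0.1])),
]
w = train_bilingual(bitext)

# English alone prefers ("PER",); the confident German side can pull it toward agreement.
en, de = combined_predict(w, [("PER", "LOC"), ("PER",)], [0.45, 0.55],
                          [("PER", "LOC"), ("ORG",)], [0.95, 0.05])
print(w, en, de)
```

Using the monolingual outputs as soft targets is what lets the bilingual weights be estimated without any annotated bitext; the betas then trade off trust in each monolingual view against the learned agreement features. Retraining a monolingual model on the combined predictions, as the abstract describes, would be a further self-training step not shown in this sketch.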
Similar resources
Effective Bilingual Constraints for Semi-Supervised Learning of Named Entity Recognizers
Most semi-supervised methods in Natural Language Processing capitalize on unannotated resources in a single language; however, information can be gained from using parallel resources in more than one language, since translations of the same utterance in different languages can help to disambiguate each other. We demonstrate a method that makes effective use of vast amounts of bilingual text (a....
Learning a second language and working memory: the role of bilinguali
The purpose of the present study was to evaluate the function of working memory in bilingual children, monolingual children, and children with learning disorders. The research used a causal-comparative design. Participants included 60 monolingual children, 34 children with learning disabilities, and 62 bilingual children, who completed the Wechsler Intelligence Scale and preschool children's Wec...
Comparison of Attention Shifting and Creativity in Bilingual and Monolingual Children
Objective: Several studies have reported that bilingualism may affect cognitive processes. Second language acquisition takes place in a variety of ways. However, given that language training courses offered by institutes are expanding rapidly, the effects of learning a foreign language through language schools deserve a separate line of investigation in ...
Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
Resolving coordination ambiguity is a classic hard problem. This paper looks at coordination disambiguation in complex noun phrases (NPs). Parsers trained on the Penn Treebank are reporting impressive numbers these days, but they don’t do very well on this problem (79%). We explore systems trained using three types of corpora: (1) annotated (e.g. the Penn Treebank), (2) bitexts (e.g. Europarl),...
SMT Helps Bitext Dependency Parsing
We propose a method to improve the accuracy of parsing bilingual texts (bitexts) with the help of statistical machine translation (SMT) systems. Previous bitext parsing methods use human-annotated bilingual treebanks that are hard to obtain. Instead, our approach uses an auto-generated bilingual treebank to produce bilingual constraints. However, because the auto-generated bilingual treebank co...